Towards joint decoding of binary Tardos fingerprinting codes

Peter Meerwald and Teddy Furon 
Abstract 

The class of joint decoders of probabilistic fingerprinting codes is of utmost importance in theoretical papers to establish the concept of fingerprinting capacity [1]-[3]. However, no implementation supporting a large user base is known to date. This article presents an iterative decoder which is, as far as we are aware, the first practical attempt towards joint decoding. The discriminative power of the scores benefits, on the one hand, from the side information of previously accused users and, on the other hand, from recently introduced universal linear decoders for compound channels [4]. Neither the code construction nor the decoder makes precise assumptions about the collusion (size or strategy). The extension to incorporate soft outputs from the watermarking layer is straightforward. Extensive experiments benchmark the very good performance of the decoder and offer a clear comparison with previous state-of-the-art decoders.

Index Terms 

Traitor tracing, Tardos codes, fingerprinting, compound channel. 

I. Introduction 

Traitor tracing or active fingerprinting has witnessed a flurry of research efforts since the invention of the now well-celebrated Tardos codes [5]. The codes of G. Tardos are optimal in the sense that the code length $m$ necessary to fulfill the requirements ($n$ users, $c$ colluders, probability of accusing at least one innocent below $P_{fp}$) has the minimum scaling $\Omega(c^2 \log(n P_{fp}^{-1}))$.

A first group of articles analyses such probabilistic fingerprinting codes from the viewpoint of information theory. They define the worst-case attack a collusion of size $c$ can produce, and also the best counter-attack. The main achievement is a saddle point theorem in the game between the colluders and the code designer, which establishes the concept of fingerprinting capacity $C(c)$ [1]-[3]. Roughly speaking, for a maximum collusion size $c$, the maximum number of users grows exponentially with $m$, with an exponent equal to $C(c)$, while guaranteeing probabilities of error that vanish asymptotically as the code length increases. Sec. II summarizes these elements of information theory.
Our point of view is much more practical and signal processing oriented. Thanks to an appropriate watermarking technique, $m$ bits have been hidden in the distributed copies. At the time a pirated version is discovered, the content has been distributed to $n$ users. Our goal is to identify some colluders under the strict requirement that the probability of accusing innocents is below $P_{fp}$. Clearly, we are not in an asymptotic setup since $m$ and $n$ are fixed. The encoder and the decoder are not informed of the collusion size and its attack; therefore, there is no way to tell whether the actual rate $R = m^{-1}\log_2 n$ is indeed below the capacity $C(c)$.

A second group of research works deals with decoding algorithms. Here, a first difficulty is to compute user scores that are as discriminative as possible. A second difficulty is to set a threshold such that one can reliably accuse users who are part of the collusion. These two steps are not easy since the decoder knows neither the size nor the attack of the collusion. Sec. III sums up the past approaches, which are mainly based on single decoders. It also motivates our decoder based on compound channel theory and the use of a rare event estimator.

A third difficulty is to have a fast implementation of the accusation algorithm in order to face a large-scale set of users. A main advantage of some fingerprinting schemes based on error-correcting codes is to offer an accusation procedure with runtime polynomial in $m$. In comparison, the well-known Tardos-Skoric single decoder is an exhaustive search of complexity $O(nm)$ [8]. Since in theory $n$ can asymptotically be of the order of $2^{mC(c)}$, decoding of Tardos codes might be intractable. Again, we do not consider such a theoretical setup, but we take care to keep the decoding complexity affordable for the orders of magnitude met in practical applications.



Sec. IV focuses on the iterative architecture of our joint decoder based on three primitives: channel inference, score computation, and thresholding. Its iterative nature stems from two key ideas: i) the codeword of a newly accused user is integrated as side information for the next iterations; ii) joint decoding is manageable on a short list of suspects. Sec. V provides an extension to soft decoding. In Sec. VI we present our experimental investigations, including a comparison with related works for typical values of $(m, n)$. This shows the benefit of our decoder: better decoding performance with acceptable runtime in practical scenarios.

II. Tardos code and the collusion model 
We briefly review the construction and some known facts about Tardos codes. 

A. Construction 

The binary code is composed of $n$ codewords of $m$ bits. The codeword $\mathbf{x}_j = (x_j(1), \dots, x_j(m))^T$ identifying user $j \in \mathcal{U} = [n]$, where $[n] := \{1, \dots, n\}$, is composed of $m$ binary symbols independently drawn at code construction s.t. $\mathbb{P}(x_j(i) = 1) = p_i$, $\forall i \in [m]$. At initialization, the auxiliary variables $\{p_i\}_{i=1}^{m}$ are independently and identically drawn according to the distribution $f(p): [0, 1] \to \mathbb{R}^+$. Both the code $\mathbf{X} = [\mathbf{x}_1, \dots, \mathbf{x}_n]$ and the auxiliary sequence $\mathbf{p} = (p_1, \dots, p_m)^T$ must be kept secret.
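As a minimal sketch of this construction, assuming NumPy and the continuous Tardos density $f_T(p) = 1/(\pi\sqrt{p(1-p)})$ adopted later in the article (no cutoff), the following draws $\mathbf{p}$ through the change of variable $p = \sin^2(u)$ with $u$ uniform on $[0, \pi/2]$, which has exactly this density. The function name `tardos_code` is hypothetical, and the codewords are stored as the rows of an $n \times m$ array rather than as columns.

```python
import numpy as np

def tardos_code(n, m, rng):
    """Draw a binary Tardos code with the continuous density f_T (no cutoff).

    Returns the n x m array X (row j is the codeword of user j) and the
    secret vector p of length m.
    """
    # If u ~ U(0, pi/2), then p = sin(u)^2 has density 1 / (pi * sqrt(p (1 - p))).
    u = rng.uniform(0.0, np.pi / 2, size=m)
    p = np.sin(u) ** 2
    # x_j(i) ~ Bernoulli(p_i), independently over users j and indices i.
    X = (rng.random((n, m)) < p).astype(np.uint8)
    return X, p

rng = np.random.default_rng(2013)
X, p = tardos_code(n=1000, m=512, rng=rng)
print(X.shape, p.min() >= 0.0, p.max() <= 1.0)
```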

B. Collusion attack 

The collusion attack or collusion channel describes the way the $c$ colluders $\mathcal{C} = \{j_1, \dots, j_c\}$ merge their binary codewords $\mathbf{x}_{j_1}, \dots, \mathbf{x}_{j_c}$ to forge the binary pirated sequence $\mathbf{y}$. It is usually modelled as a memoryless discrete multiple access channel, which is fair in the sense that all colluders participate equally in the forgery. This assumption comes from the fact that the worst-case attacks are indeed memoryless for Tardos codes, where symbols are generated independently [9, Lemma 3.3]. Moreover, in a detect-many scenario, there is no hope of identifying almost idle colluders if the attack is not fair [9, Lemma 3.2].

This leads to a $2 \times (c+1)$ probability transition matrix $[\mathbb{P}(Y|\Phi)]$, where $\Phi = \sum_{j \in \mathcal{C}} X_j$ is a random variable counting the number of symbols '1' the colluders received out of $c$ symbols. A common parametrization of the collusion attack on binary codes is the vector $\boldsymbol{\theta}_c = (\theta_c(0), \dots, \theta_c(c))^T$ with $\theta_c(\varphi) = \mathbb{P}(Y = 1 | \Phi = \varphi)$. The usual working assumption, the so-called marking assumption [10], imposes that $\theta_c(0) = 1 - \theta_c(c) = 0$. The set of collusion attacks that $c$ colluders can lead under the marking assumption is denoted by $\Theta_c$:

$$\Theta_c = \{\boldsymbol{\theta} \in [0, 1]^{c+1} \,|\, \theta(0) = 1 - \theta(c) = 0\}. \tag{1}$$

Examples of attacks following this model are given, for instance, in [11].
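To make the model concrete, here is a small simulation sketch, assuming NumPy: the pirated symbol at index $i$ is '1' with probability $\theta_c(\Phi(i))$, where $\Phi(i)$ counts the '1's among the colluders' symbols. The helper names are hypothetical; `majority` and `interleaving` are two classical attacks written in this parametrization (for $c = 5$, `majority` returns the vector $\boldsymbol{\theta}_{5,\text{maj}}$ used in Sec. III-B).

```python
import numpy as np

def collude(X_col, theta, rng):
    """Forge the pirated sequence y from the colluders' codewords (rows of
    X_col) with a memoryless, fair attack theta = (theta(0), ..., theta(c))."""
    c, m = X_col.shape
    theta = np.asarray(theta, dtype=float)
    if theta.shape != (c + 1,) or theta[0] != 0.0 or theta[-1] != 1.0:
        raise ValueError("theta must lie in Theta_c, see (1)")
    phi = X_col.sum(axis=0)                  # number of '1' received at each index
    return (rng.random(m) < theta[phi]).astype(np.uint8)

def majority(c):
    """y(i) = 1 iff more than half of the colluders received a '1'."""
    return (np.arange(c + 1) > c / 2).astype(float)

def interleaving(c):
    """Each symbol is copied from a colluder picked uniformly at random."""
    return np.arange(c + 1) / c

rng = np.random.default_rng(0)
X_col = rng.integers(0, 2, size=(5, 512))
y = collude(X_col, majority(5), rng)
```

Because $\theta(0) = 0$ and $\theta(c) = 1$, wherever all colluders hold the same symbol the forged symbol necessarily equals it, whatever the attack.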

C. Accusation 

Denote by $\mathcal{A} \subseteq \mathcal{U}$ the set of users accused by the decoder. The probability of false positive is defined by $P_{fp} = \mathbb{P}(\mathcal{A} \not\subseteq \mathcal{C})$. In practice, a major requirement is to control this probability so that it is lower than a given significance level.

In a detect-one scenario, $\mathcal{A}$ is either a singleton or the empty set. A good decoder has a low probability of false negative, defined by $P_{fn} = \mathbb{P}(\mathcal{A} = \emptyset)$. In a detect-many scenario, several users are accused, and a possible figure of merit is the number of caught colluders: $|\mathcal{A} \cap \mathcal{C}|$. In the literature, there exists a third scenario, the so-called detect-all, where a false negative happens if at least one colluder is missed. This article only considers the first two scenarios.

D. Guidelines from information theory 

This article does not claim any new theoretical contribution, but recalls some recent results that provide guidelines for the design of our practical decoder.

A single decoder computes a score per user. It accuses users whose score is above a threshold (detect-many scenario) or the user with the biggest score above the threshold (detect-one scenario). Under both scenarios, and provided that the collusion is fair, the performance of such decoders is theoretically bounded by the achievable rate $R_s(f, \boldsymbol{\theta}_c) = I(X; Y | P, \boldsymbol{\theta}_c) = \mathbb{E}_{P \sim f}[I(X; Y | p, \boldsymbol{\theta}_c)]$ [9, Th. 4.1]. A fundamental result is that, for a given collusion size $c$, there exists an equilibrium $(f_{c,s}, \boldsymbol{\theta}_{c,s})$ to the max-min game between the colluders (who select $\boldsymbol{\theta}$) and the code designer (who selects $f$), as defined by $\max_f \min_{\boldsymbol{\theta} \in \Theta_c} R_s(f, \boldsymbol{\theta})$ in [1, Th. 4].
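The rate $R_s$ can be evaluated numerically. The sketch below assumes NumPy, a fair collusion, and the density $f_T$ (sampled as $p = \sin^2 u$); the midpoint rule in $u$ and the helper names are our choices. Given $X = x$ for the tracked colluder, the number of '1's held by the other $c - 1$ colluders is Binomial$(c-1, p)$, which yields $\mathbb{P}(Y = 1 | X = x, p, \boldsymbol{\theta})$ and hence $I(X; Y | p, \boldsymbol{\theta})$.

```python
import numpy as np
from math import comb

def h2(q):
    """Binary entropy in bits, with the convention 0 log 0 = 0."""
    q = np.asarray(q, dtype=float)
    out = np.zeros_like(q)
    inside = (q > 0) & (q < 1)
    qi = q[inside]
    out[inside] = -qi * np.log2(qi) - (1 - qi) * np.log2(1 - qi)
    return out

def rate_single(theta, num=20000):
    """R_s(f_T, theta) = E_{P ~ f_T}[ I(X; Y | p, theta) ] in bits per symbol."""
    theta = np.asarray(theta, dtype=float)
    c = len(theta) - 1
    # Midpoint rule over u ~ U(0, pi/2); p = sin(u)^2 is then distributed as f_T.
    u = (np.arange(num) + 0.5) * (np.pi / 2) / num
    p = np.sin(u)[:, None] ** 2
    k = np.arange(c)                     # '1's held by the other c - 1 colluders
    binom = np.array([comb(c - 1, kk) for kk in k], dtype=float)
    B = binom * p ** k * (1 - p) ** (c - 1 - k)  # Binomial(c - 1, p) pmf, shape (num, c)
    p1 = B @ theta[1:]                   # P(Y = 1 | X = 1, p)
    p0 = B @ theta[:-1]                  # P(Y = 1 | X = 0, p)
    p = p[:, 0]
    py = p * p1 + (1 - p) * p0           # P(Y = 1 | p)
    mutual_info = h2(py) - p * h2(p1) - (1 - p) * h2(p0)
    return mutual_info.mean()

print(rate_single([0, 1]))               # c = 1: E[h_b(P)], about 0.557 bits
print(rate_single([0, 0, 0, 1, 1, 1]))   # majority attack of 5 colluders
```

For $c = 1$ and $\boldsymbol{\theta} = (0, 1)^T$ the pirated copy is the colluder's codeword, so $I(X; Y | p) = h_b(p)$ and the function returns $\mathbb{E}_{P \sim f_T}[h_b(P)] = 2 - 1/\ln 2 \approx 0.557$ bits, the value quoted in Sec. III-C.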

A joint decoder computes a score per subset of $\ell \le c$ users and accuses the users belonging to subsets whose score is above a threshold, or only the most likely guilty amongst these users. Under both scenarios, and provided that the collusion is fair, the performance of such decoders is theoretically bounded by the achievable rate $R_j(f, \boldsymbol{\theta}_c) = \ell^{-1} I(\Phi; Y | P, \boldsymbol{\theta}_c) = \ell^{-1} \mathbb{E}_{P \sim f}[I(\Phi; Y | p, \boldsymbol{\theta}_c)]$ [9, Th. 3.3], where $\Phi$ here denotes the random variable sum of the symbols of the $\ell$ users in the subset. Moreover, for a given collusion size $c$, there also exists an equilibrium $(f_{c,j}, \boldsymbol{\theta}_{c,j})$ to the max-min game $\max_f \min_{\boldsymbol{\theta} \in \Theta_c} R_j(f, \boldsymbol{\theta})$ [1, Th. 4].

Asymptotically, as $c \to +\infty$, both $f_{c,j}$ and $f_{c,s}$ converge to $f_T(p) = 1/(\pi\sqrt{p(1-p)})$, the distribution originally proposed by G. Tardos [1, Cor. 7], and both $\min_{\boldsymbol{\theta}} R_j(f_T, \boldsymbol{\theta})$ and $\min_{\boldsymbol{\theta}} R_s(f_T, \boldsymbol{\theta})$ quickly approach the equilibrium value of the respective max-min game [1, Fig. 2]. Yet, the code designer needs to bet on a collusion size $c'$ in order to use the optimal distribution $f_{c',s}$ (or $f_{c',j}$ if the decoder is joint). The integer $c'$ plays the role of a desired security level.

Despite the division by $\ell$ in the expression of $R_j(f, \boldsymbol{\theta})$, it appears that $R_s(f, \boldsymbol{\theta}) \le R_j(f, \boldsymbol{\theta})$, $\forall \boldsymbol{\theta}$ [9, Eq. (3.4)]. This tells us that a joint decoder is theoretically more powerful than a single decoder. However, a joint decoder needs to compute $O(n^\ell)$ scores since there are $\binom{n}{\ell}$ subsets of size $\ell$. This complexity is absolutely intractable for large-scale applications, even for a small $\ell$. This explains why, so far, joint decoders were only considered theoretically to derive the fingerprinting capacity. Our idea is that there is no need to consider all these subsets, since the vast majority of them are composed only of innocent users. Our decoder iteratively prunes out users deemed innocent and considers the subsets over the small set of remaining suspects.

This iterative strategy results in a decoder which is a mix of single and joint decoding. Unfortunately, it prevents us from taking advantage of the game-theoretic results mentioned above: we cannot find the optimal distribution $f$ and the worst collusion attack against our decoder. Nevertheless, our decoder works with any distribution $f$ under some conditions stated in Sec. III. For all these reasons, the experiments of Sec. VI are done with the most common Tardos distribution $f_T$.

M. Fernandez and M. Soriano proposed an iterative accusation process for a fingerprinting scheme based on error-correcting codes [7]. Each iteration takes advantage of the codewords of colluders already identified in the previous iterations. The same idea is possible with Tardos probabilistic fingerprinting codes. This is justified by the fact that the side information $\Lambda$, defined as the random variable sum of the symbols of the already identified colluders, increases the mutual information: $I(\Phi; Y | P, \boldsymbol{\theta}_c) \le I(\Phi; Y | P, \boldsymbol{\theta}_c, \Lambda)$. Indeed, side information helps more than joint decoding, as proved in [9, Eq. (3.3)].

The above guidelines can be summarized as follows: use the continuous Tardos distribution $f_T$ for code construction, integrate the codewords of accused users as side information, and finally use a joint decoder on a short list of suspects.

III. A SINGLE DECODER BASED ON COMPOUND CHANNEL THEORY AND RARE EVENT ANALYSIS 

This section first reviews some single decoders and then presents new decoders based on compound channel theory and rare event analysis. The first difficulty is to compute a score per user such that the scores of the colluders are statistically well separated from the scores of the innocents. The second difficulty is to set a practical threshold such that the probability of false positive is under control.
Detection theory tells us that the score given by the Log-Likelihood Ratio (LLR):

$$s_j = \sum_{i=1}^{m} \log \frac{\mathbb{P}(y(i) | x_j(i), \boldsymbol{\theta}_c)}{\mathbb{P}(y(i) | \boldsymbol{\theta}_c)} \tag{2}$$

is optimally discriminative in the Neyman-Pearson sense to decide the guiltiness of user $j$. Yet, the LLR needs the knowledge of the true collusion attack $\boldsymbol{\theta}_c$, which prevents the use of this optimal single decoder in practical settings. Some papers proposed a so-called 'Learn and Match' strategy using the LLR score tuned on an estimation $\hat{\boldsymbol{\theta}}$ of the attack channel [11]. Unfortunately, a lack of identifiability obstructs a direct estimation from $(\mathbf{y}, \mathbf{p})$ (see Sec. III-B). Indeed, the estimation is sound only if $c$ is known and if the number of different values taken by $p$ is bigger than or equal to $c - 1$¹: $\mathbb{P}(Y = 1 | \boldsymbol{\theta}, p)$ is a polynomial in $p$ of degree at most $c$ (see (14) with $v = u = 0$) going from point $(0, 0)$ to $(1, 1)$, so we need $c - 1$ more points to uniquely identify this polynomial. To overcome this lack of information about $c$, an Expectation-Maximization (E.-M.) approach has been proposed, but it is not satisfactory since it does not scale well with the number of users [11]. Moreover, the setting of the threshold was not addressed.
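The lack of identifiability can be checked numerically. Under the marking assumption, $\mathbb{P}(Y = 1 | p, \boldsymbol{\theta}) = \sum_{k=0}^{c} \theta(k) \binom{c}{k} p^k (1-p)^{c-k}$ is a polynomial in Bernstein form whose coefficients are the entries of $\boldsymbol{\theta}$. The sketch below, assuming NumPy, uses Bernstein degree elevation to build, from a model for $c$ colluders, a model for $c + 1$ colluders yielding exactly the same distribution of $Y$ given $p$. This is one construction illustrating the ambiguity discussed in Sec. III-B, not necessarily the one of [14]; the helper names are hypothetical.

```python
import numpy as np
from math import comb

def p_y1(theta, p):
    """P(Y = 1 | p, theta) = sum_k theta(k) C(c, k) p^k (1 - p)^(c - k)."""
    theta = np.asarray(theta, dtype=float)
    c = len(theta) - 1
    p = np.asarray(p, dtype=float)
    return sum(theta[k] * comb(c, k) * p**k * (1 - p) ** (c - k) for k in range(c + 1))

def elevate(theta):
    """Bernstein degree elevation: returns a model for c + 1 colluders that
    induces exactly the same P(Y = 1 | p) as theta does for c colluders."""
    theta = np.asarray(theta, dtype=float)
    c = len(theta) - 1
    k = np.arange(c + 2)
    prev = np.concatenate(([0.0], theta))   # theta(k - 1), zero-padded at k = 0
    curr = np.concatenate((theta, [0.0]))   # theta(k), zero-padded at k = c + 1
    return (k / (c + 1)) * prev + (1 - k / (c + 1)) * curr

theta3 = np.array([0.0, 0.0, 1.0, 1.0])     # majority attack, c = 3
theta5 = elevate(elevate(theta3))           # a model in Theta_5
grid = np.linspace(0.0, 1.0, 101)
print(theta5)
print(np.max(np.abs(p_y1(theta3, grid) - p_y1(theta5, grid))))
```

Applied twice to the majority attack of 3 colluders, it yields $(0, 0, 0.3, 0.7, 1, 1)^T \in \Theta_5$, which differs from the majority attack of 5 colluders but cannot be distinguished from the original model by observing $(\mathbf{y}, \mathbf{p})$. Each new coefficient is a convex combination of two old ones, so the result stays in $[0, 1]$ and keeps $\theta(0) = 1 - \theta(c+1) = 0$.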

On the other hand, there are decoders that do not adapt their score computation to the collusion. This is the case of the score originally proposed by G. Tardos [5], and later improved by B. Skoric et al. [8]. It has an invariance property: its statistics, up to the second order, do not depend on the collusion attack channel $\boldsymbol{\theta}$, but only on the collusion size $c$ [12]. Thanks to this invariance, whatever the collusion attack, there exists a threshold $\tau$ guaranteeing a probability of false positive below $P_{fp}$ while keeping the probability of false negative away from 1, provided that the code is long enough, i.e., $m = \Omega(c^2 \log(n P_{fp}^{-1}))$. However, there is a price to pay: the scores are clearly less discriminative than the LLR.
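For reference, here is a minimal sketch of the symmetric Tardos score of [8] as we read it, assuming NumPy and $0 < p_i < 1$ (as with the continuous density $f_T$): at indices where $y(i) = 1$, a symbol '1' earns $\sqrt{(1-p_i)/p_i}$ and a '0' costs $\sqrt{p_i/(1-p_i)}$, and indices where $y(i) = 0$ are treated symmetrically with $p_i \to 1 - p_i$. The function name is hypothetical.

```python
import numpy as np

def symmetric_tardos_scores(X, y, p):
    """Symmetric Tardos accusation sums for all users (rows of X).

    Assumes 0 < p_i < 1 for every index i.
    """
    X = np.asarray(X, dtype=bool)
    y = np.asarray(y, dtype=bool)
    g1 = np.sqrt((1 - p) / p)
    g0 = np.sqrt(p / (1 - p))
    # y(i) = 1: +g1 if x = 1, -g0 if x = 0.
    # y(i) = 0 (mirrored, p -> 1 - p): +g0 if x = 0, -g1 if x = 1.
    reward = np.where(y, g1, g0)       # weight when x_j(i) == y(i)
    penalty = np.where(y, -g0, -g1)    # weight when x_j(i) != y(i)
    return np.where(X == y, reward, penalty).sum(axis=1)
```

For an innocent user, $x_j(i) \sim \mathcal{B}(p_i)$ independently of $y(i)$, so each summand has mean 0 and variance 1 whatever $\boldsymbol{\theta}_c$: this is the invariance property mentioned above.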

Some theoretical papers [13, Sec. V], [9, Sec. 5.2] promote another criterion, the so-called 'universality', for the design of decoders. The performance (usually evaluated as the achievable rate or the error exponent) when facing a collusion channel $\boldsymbol{\theta}_c$ should not be lower than the performance against the worst attack $\boldsymbol{\theta}^*$. In a sense, it is a clear warning to the 'Learn and Match' strategy. Suppose that $\boldsymbol{\theta}_c \neq \boldsymbol{\theta}^*$ and that, for some reason, the estimation of the collusion attack is of poor quality. In any case, a mismatch between $\hat{\boldsymbol{\theta}}$ and $\boldsymbol{\theta}_c$ should not ruin the performance of the decoder to the point that it is even lower than what is achievable under the worst attack $\boldsymbol{\theta}^*$. The above cited references [9], [13] recommend the single universal decoder based on the empirical mutual information $\hat{I}(\mathbf{x}; \mathbf{y} | \mathbf{p})$ (or the empirical equivocation for the joint decoder). The setting of the threshold depends on the desired error exponent of the false positive rate. Therefore, it is valid only asymptotically.
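A simplified sketch of such a score, assuming NumPy and, as in the experiments of Fig. 1, $p$ uniformly quantized into 10 bins: within each bin, the joint type of $(x_j(i), y(i))$ is counted and its mutual information computed, and the bins are averaged with weights proportional to their number of indices. The function name and this plug-in estimator are our illustration, not necessarily the exact decoder of [9], [13].

```python
import numpy as np

def empirical_mi_score(x, y, p, bins=10):
    """Empirical mutual information between x_j and y given quantized p, in bits."""
    x = np.asarray(x, dtype=int)
    y = np.asarray(y, dtype=int)
    cell = np.minimum((np.asarray(p) * bins).astype(int), bins - 1)
    m = len(x)
    total = 0.0
    for b in np.unique(cell):
        idx = cell == b
        m_b = idx.sum()
        joint = np.zeros((2, 2))
        np.add.at(joint, (x[idx], y[idx]), 1.0)   # joint type of (x, y) in cell b
        joint /= m_b
        # product of the marginal types, as a 2 x 2 outer product
        indep = joint.sum(axis=1, keepdims=True) @ joint.sum(axis=0, keepdims=True)
        nz = joint > 0
        total += (m_b / m) * np.sum(joint[nz] * np.log2(joint[nz] / indep[nz]))
    return total
```

A user whose symbols copy $\mathbf{y}$ within a balanced bin reaches 1 bit per symbol, while a user with a constant codeword scores 0, since the score never exceeds the empirical entropy of $\mathbf{x}_j$.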

To summarize, there have been two approaches: adaptation or non-adaptation to the collusion process. The first class is not very well grounded since the estimation of the collusion is an issue and the impact of a mismatch has yet to be studied. The second approach is more reliable, but with a loss of discrimination power compared to the optimal LLR. The following subsections present two new decoders, one for each approach, based on compound channel theory.

¹ This is the case in this article since we opt for the continuous Tardos distribution $f_T$.
A. Some elements on compound channels

Recently, in the setup of digital communication through compound channels, E. Abbe and L. Zheng [4] proposed universal decoders which are linear, i.e., very simple in essence. This section summarizes this theory, and the next one proposes two applications to Tardos single decoders.

A compound channel is a set $\mathcal{S}$ of channels, say discrete memoryless channels $X \in \mathcal{X} \to Y \in \mathcal{Y}$ defined by their probability transition matrix $W_{\boldsymbol{\theta}} = [\mathbb{P}(Y | X, \boldsymbol{\theta})]$ parameterized by $\boldsymbol{\theta} \in \Theta$. The coder shares a code book $\mathbf{X} = \{\mathbf{x}_j\}_{j=1}^{n} \in \mathcal{X}^{m \times n}$ with the decoder. Its construction is assumed to be a random code realization from a provably good mass distribution $P_X$. After receiving a channel output $\mathbf{y} \in \mathcal{Y}^m$, a decoder computes a score per codeword $\mathbf{x}_j$, $j \in [n]$, and yields the message associated with the codeword with the biggest score. The decoder is linear if the score has the following structure:

$$s_j = \sum_{i=1}^{m} d(x_j(i), y(i)), \tag{3}$$

with $d(\cdot, \cdot): \mathcal{X} \times \mathcal{Y} \to \mathbb{R}$. For instance, score (2), the so-called MAP decoder in digital communications [4], is linear with $d(x, y) = \log(\mathbb{P}(y | x, \boldsymbol{\theta}) / \mathbb{P}(y | \boldsymbol{\theta}))$. However, in the compound channel setup, the decoder does not know through which channel of $\mathcal{S}$ the codeword has been transmitted, and therefore it cannot rely on the MAP.

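We are especially interested in two results. First, if $\mathcal{S}$ is one-sided (see Def. 1 below), then the MAP decoder tuned on the worst channel $W_{\boldsymbol{\theta}^*}$ is a linear universal decoder [4, Lemma 5]. Second, if $\mathcal{S} = \cup_{k=1}^{K} \mathcal{S}_k$ with $K$ finite and $\mathcal{S}_k$ one-sided $\forall k \in [K]$, then the following generalized linear decoder is universal [4, Th. 1]; the score of a codeword is the maximum of the $K$ MAP scores, each tuned on the worst channel $W_{\boldsymbol{\theta}^*_k}$ of $\mathcal{S}_k$:

$$s_j = \max_{k \in [K]} \sum_{i=1}^{m} \log \frac{\mathbb{P}(y(i) | x_j(i), \boldsymbol{\theta}^*_k)}{\mathbb{P}(y(i) | \boldsymbol{\theta}^*_k)}. \tag{4}$$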
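The linear structure (3) and its generalized form (4) can be sketched for Tardos single decoders as follows, assuming NumPy and that the worst channels $\boldsymbol{\theta}^*_k$ are already known (computing them requires minimizing $R_s$, which is not shown). In the fingerprinting setting, $d$ also depends on the index $i$ through $p_i$, so the sketch tabulates $d(x, y)$ per index. The small constant `eps` guarding against $\log 0$ and all function names are our assumptions.

```python
import numpy as np
from math import comb

def map_weights(theta, p, eps=1e-12):
    """d(x, y) = log P(y | x, p_i, theta) / P(y | p_i, theta) for every index i,
    as an array of shape (2, 2, m) indexed by [x, y, i]."""
    theta = np.asarray(theta, dtype=float)
    c = len(theta) - 1
    k = np.arange(c)
    P = np.asarray(p, dtype=float)[:, None]
    B = np.array([comb(c - 1, kk) for kk in k], dtype=float) * P ** k * (1 - P) ** (c - 1 - k)
    p1 = B @ theta[1:]                   # P(Y = 1 | X = 1, p)
    p0 = B @ theta[:-1]                  # P(Y = 1 | X = 0, p)
    py = P[:, 0] * p1 + (1 - P[:, 0]) * p0
    d = np.empty((2, 2, len(py)))
    # eps guards against log(0) when theta has entries equal to 0 or 1
    d[1, 1] = np.log((p1 + eps) / (py + eps))
    d[0, 1] = np.log((p0 + eps) / (py + eps))
    d[1, 0] = np.log((1 - p1 + eps) / (1 - py + eps))
    d[0, 0] = np.log((1 - p0 + eps) / (1 - py + eps))
    return d

def linear_scores(X, y, d):
    """Eq. (3): s_j = sum_i d(x_j(i), y(i)) at index i, for all rows of X."""
    X = np.asarray(X, dtype=int)
    y = np.asarray(y, dtype=int)
    cols = np.arange(X.shape[1])
    return d[X, y[None, :], cols[None, :]].sum(axis=1)

def generalized_scores(X, y, p, thetas):
    """Eq. (4): maximum over K MAP scores, one per (given) worst channel theta*_k."""
    return np.max([linear_scores(X, y, map_weights(t, p)) for t in thetas], axis=0)
```

Once the table $d$ is computed, scoring all $n$ users costs $O(nm)$ lookups, which is what makes linear decoders attractive here.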

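Definition 1 (One-sided set, Def. 3 of [4]): A set $\mathcal{S}$ is one-sided with respect to an input distribution $P_X$

• if the following minimizer is unique:

$$\boldsymbol{\theta}^* = \arg\min_{\boldsymbol{\theta} \in \mathrm{cl}(\Theta)} I(P_X, \boldsymbol{\theta}), \tag{5}$$

with $I(P_X, \boldsymbol{\theta})$ the mutual information $I(X; Y)$ with $(X, Y) \sim P_X \circ W_{\boldsymbol{\theta}}$ (where $P \circ W$ denotes the joint distribution with $P$ the distribution of $X$ and $W$ the conditional distribution), and $\mathrm{cl}(\Theta)$ the closure of $\Theta$,

• and if, $\forall \boldsymbol{\theta} \in \Theta$,

$$D(P_X \circ W_{\boldsymbol{\theta}} \,\|\, P_X \times P_{Y,\boldsymbol{\theta}^*}) \ge D(P_X \circ W_{\boldsymbol{\theta}} \,\|\, P_X \circ W_{\boldsymbol{\theta}^*}) + D(P_X \circ W_{\boldsymbol{\theta}^*} \,\|\, P_X \times P_{Y,\boldsymbol{\theta}^*}), \tag{6}$$

with $D(\cdot \| \cdot)$ the Kullback-Leibler divergence, $P_{Y,\boldsymbol{\theta}}$ the marginal of $Y$ induced by $P_X \circ W_{\boldsymbol{\theta}}$, and $P_X \times P_{Y,\boldsymbol{\theta}}$ the product of the marginals.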

B. Application to single Tardos decoders 

Contrary to the code construction phase, it is less critical at the decoding side to presume that the real collusion size $c$ is less than or equal to a given parameter $c_{\max}$. This parameter can be set to the largest number of colluders the fingerprinting code can handle with a reasonable error probability, knowing $(m, n)$. Another argument is that this assumption is not definitive: if decoding fails because the assumption does not hold, nothing prevents us from re-launching decoding with a bigger $c_{\max}$. Let us assume $c \le c_{\max}$ in the sequel.

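A first application of the work [4] is straightforward. The collusion channel belongs to the set $\cup_{k=2}^{c_{\max}} \Theta_k$, with $\Theta_k$ as defined in (1), and thanks to [4, Lemma 4], each convex set $\Theta_k$ is one-sided. According to [4, Th. 1], the decoder based on the following score is universal:

$$s_j = \max_{k \in [2, \dots, c_{\max}]} \sum_{i=1}^{m} \log \frac{\mathbb{P}(y(i) | x_j(i), \boldsymbol{\theta}^*_{k,f_T})}{\mathbb{P}(y(i) | \boldsymbol{\theta}^*_{k,f_T})}, \tag{7}$$

where $\boldsymbol{\theta}^*_{k,f_T} = \arg\min_{\boldsymbol{\theta} \in \Theta_k} R_s(f_T, \boldsymbol{\theta})$, $\forall k \in [2, \dots, c_{\max}]$. This decoder does not adapt its score computation to the collusion attack.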

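The second application is more involved, as the lack of identifiability turns to our advantage. The true collusion channel $\boldsymbol{\theta}_c$ has generated data $\mathbf{y}$ distributed as $\mathbb{P}(\mathbf{y} | \mathbf{p}, \boldsymbol{\theta}_c)$. Let us define the class $\mathcal{E}(\boldsymbol{\theta}_c) = \{\boldsymbol{\theta} \,|\, \mathbb{P}(y | p, \boldsymbol{\theta}) = \mathbb{P}(y | p, \boldsymbol{\theta}_c), \forall (y, p) \in \{0, 1\} \times [0, 1]\}$. Thanks to [14, Prop. 3], we know that $\mathcal{E}(\boldsymbol{\theta}_c)$ is not restricted to the singleton $\{\boldsymbol{\theta}_c\}$, since for any $c' > c$ there exists one $\boldsymbol{\theta}_{c'} \in \mathcal{E}(\boldsymbol{\theta}_c)$. This holds especially for $c_{\max}$. Asymptotically with the code length, the consistent Maximum Likelihood Estimator (MLE) parameterized on $c_{\max}$, as defined in (16), yields an estimation $\hat{\boldsymbol{\theta}}_{c_{\max}} \approx \boldsymbol{\theta}_{c_{\max}} \in \mathcal{E}(\boldsymbol{\theta}_c)$ with increasing accuracy. This estimation is not reliable because $c \neq c_{\max}$ a priori. Therefore, we prefer to refer to $\hat{\boldsymbol{\theta}}_{c_{\max}}$ as a collusion inference rather than a collusion estimation, and the scoring uses this inference as follows: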

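$$s_j = \sum_{i=1}^{m} \log \frac{\mathbb{P}(y(i) | x_j(i), \hat{\boldsymbol{\theta}}_{c_{\max}})}{\mathbb{P}(y(i) | \hat{\boldsymbol{\theta}}_{c_{\max}})}. \tag{8}$$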



Suppose that the MLE tuned on $c_{\max}$ provides a perfect inference $\hat{\boldsymbol{\theta}}_{c_{\max}} = \boldsymbol{\theta}_{c_{\max}}$; we then succeed in restricting the compound channel to the discrete set $\mathcal{E}_{c_{\max}}(\boldsymbol{\theta}_c)$, which we define as the restriction of $\mathcal{E}(\boldsymbol{\theta}_c)$ to collusions of size $c \le c_{\max}$. Appendix A shows that $\mathcal{E}_{c_{\max}}(\boldsymbol{\theta}_c)$ is one-sided, and that its worst attack is indeed $\boldsymbol{\theta}_{c_{\max}}$. Lemma 5 of [4] justifies the use of the MAP decoder (2) tuned on $\boldsymbol{\theta}_{c_{\max}}$. This application leads to a more efficient decoder since $R_s(f_T, \boldsymbol{\theta}_{c_{\max}}) \ge R_s(f_T, \boldsymbol{\theta}^*_{c_{\max},f_T})$. This decoder pertains to the approach based on score adaptation, with noticeable advantages: it is better theoretically grounded, and it is far less complex than the iterative E.-M. decoder of [11].

Figure 1 illustrates the Receiver Operating Characteristics (ROC) per user for the single decoders discussed so far, with $m = 512$ and $c = 5$ colluders performing the worst-case attack (i.e., minimizing $R_s(f_T, \boldsymbol{\theta})$ over $\Theta_5$) and the majority attack ($\boldsymbol{\theta}_{5,\text{maj}} = (0, 0, 0, 1, 1, 1)^T$). For this figure, the false positive $\alpha(\tau)$ and the false negative $\beta(\tau)$ are defined per user as follows:

$$\alpha(\tau) = \mathbb{P}(s(\mathbf{x}_{inn}, \mathbf{y}, \mathbf{p}) > \tau), \tag{9}$$

$$\beta(\tau) = \mathbb{P}(s(\mathbf{x}_{j_1}, \mathbf{y}, \mathbf{p}) < \tau), \tag{10}$$

where $\mathbf{x}_{inn}$ is a random variable denoting the codeword of an innocent user and $\mathbf{x}_{j_1}$ the codeword of the first colluder. The single decoder tuned on the collusion inference $\hat{\boldsymbol{\theta}}_{c_{\max}}$ (with $c_{\max} = 8$) performs almost as well as the MAP decoder having knowledge of $\boldsymbol{\theta}_c$. The ROC of the symmetric Tardos score is invariant w.r.t. the collusion attack. The generalized linear decoder of (7), denoted compound, takes little advantage of the fact that the majority attack is much milder than the worst attack. For a fair comparison, the single decoder based on the empirical mutual information [9] assumes a Tardos distribution uniformly quantized to 10 bins; better results (yet still below the single decoder) can be obtained when it is tuned to the optimal discrete distribution for $c = 5$ colluders.

Fig. 1. ROC plot for several decoders; $m = 512$, $c = 5$, $c_{\max} = 8$. Single (MI) is the decoder based on empirical mutual information [9]. Single (Compound) relates to (7), Single (MAP) is (2), Single is the LLR on $\hat{\boldsymbol{\theta}}_{c_{\max}}$ (8), and Symm. Tardos is the symmetric version of the G. Tardos scores proposed by B. Skoric et al. in [8].
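In simulations, (9) and (10) are estimated by counting. A minimal sketch, assuming NumPy: `s_inn` would hold the scores of innocent codewords and `s_col` those of the first colluder over many simulated collusions; the function name is hypothetical.

```python
import numpy as np

def empirical_roc(s_inn, s_col, taus):
    """Empirical counterparts of (9)-(10) for a list of thresholds:
    alpha(tau) = P(s_inn > tau) and beta(tau) = P(s_col < tau)."""
    s_inn = np.asarray(s_inn, dtype=float)[:, None]
    s_col = np.asarray(s_col, dtype=float)[:, None]
    taus = np.asarray(taus, dtype=float)[None, :]
    alpha = (s_inn > taus).mean(axis=0)
    beta = (s_col < taus).mean(axis=0)
    return alpha, beta

alpha, beta = empirical_roc([0, 1, 2, 3], [2, 3, 4, 5], [2.5])
print(alpha, beta)   # [0.25] [0.25]
```

Sweeping `taus` over the range of observed scores traces one ROC curve per decoder and per attack.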

The similarities between compound channels and fingerprinting have been our main inspiration; however, some differences prevent any claim of optimality. First, in the compound channel problem, a unique codeword has been transmitted, whereas in fingerprinting, $\mathbf{y}$ is forged from $c$ codewords, as in a multiple access channel. Therefore, the derived single decoders are provably good for chasing a given colluder (detect-one scenario), but they might not be the best when looking for more colluders (detect-many scenario). The second difference is that the decoder should give up when it is not confident enough rather than take the risk of wrongly accusing an innocent. The setting of a threshold is clearly missing for the moment.
C. Rare event analysis

This section explains how we set a threshold $\tau$ in accordance with the required $P_{fp}$ thanks to a rare event analysis. Our approach is very different from [13], [9], [12], [15], where a theoretical development either finds a general threshold suitable when facing a collusion of size $c$ or, equivalently, claims a reliable decision when the rate is below the capacity, which depends on $c$. Our threshold does not need the value of $c$, but it only holds for a given couple $(\mathbf{p}, \mathbf{y})$ and a known $n$. Once these are fixed, the scoring $s_j = s(\mathbf{x}_j, \mathbf{y}, \mathbf{p})$ is a deterministic function from $\{0, 1\}^m$ to $\mathbb{R}$. Since the codewords of the innocent users are i.i.d. and $c \ll n$, we have:

$$P_{fp} = 1 - \left(1 - \mathbb{P}(s(\mathbf{x}_{inn}, \mathbf{y}, \mathbf{p}) > \tau)\right)^{n-c} \approx n \cdot \mathbb{P}(s(\mathbf{x}_{inn}, \mathbf{y}, \mathbf{p}) > \tau). \tag{11}$$

The number of possible codewords can be evaluated as the number of typical sequences, i.e., in the order of $2^{m \, \mathbb{E}_{P \sim f_T}[h_b(P)]}$, with $h_b(p)$ the entropy in bits of a Bernoulli random variable $\mathcal{B}(p)$. Since $\mathbb{E}_{P \sim f_T}[h_b(P)] \approx 0.557$ bits, this leads to a far bigger number of typical sequences than $n$ (for instance, $m > 300$ already gives more than $2^{167}$ typical sequences). This shows that plenty of codewords have not been created when a pirated copy is found. Therefore, we consider them as occurrences of $\mathbf{x}_{inn}$, since we are sure that they have not participated in the forgery of $\mathbf{y}$. The idea is then to estimate $\tau$ s.t. $\mathbb{P}(s(\mathbf{x}_{inn}, \mathbf{y}, \mathbf{p}) > \tau) = n^{-1} P_{fp}$ thanks to a Monte Carlo simulation with newly created codewords.
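Equation (11) and the Monte Carlo principle can be sketched as follows, assuming NumPy; the helper names and the numerical requirements are hypothetical. `pfp_exact` evaluates the left-hand side of (11) in a numerically stable way, and `mc_threshold` is the naive empirical quantile of innocent scores obtained from freshly generated codewords. The latter is only meaningful when the target probability $n^{-1} P_{fp}$ is well above the inverse of the number of simulated codewords, which is precisely the limitation discussed next.

```python
import numpy as np

def pfp_exact(pi, n, c):
    """Left-hand side of (11): 1 - (1 - pi)^(n - c), computed stably."""
    return -np.expm1((n - c) * np.log1p(-pi))

def mc_threshold(innocent_scores, pi):
    """Naive Monte Carlo estimate of tau s.t. P(s(x_inn, y, p) > tau) = pi,
    from the scores of freshly generated (never distributed) codewords."""
    return np.quantile(np.asarray(innocent_scores, dtype=float), 1.0 - pi)

n, c, P_fp = 10**4, 5, 1e-3                   # hypothetical requirements
pi = P_fp / n                                 # per-user target probability
print(pfp_exact(pi, n, c), n * pi)            # the approximation in (11)
```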

The difficulty lies in the order of magnitude. Typical requirements, with a large number of users $n$ and a small $P_{fp}$, call for the estimation of $\tau$ corresponding to a probability $n^{-1} P_{fp}$ that is extremely small. This is not tractable with a basic Monte Carlo simulation on a regular computer. However, the estimator based on rare event analysis proposed in [16] performs remarkably fast within this range of magnitude. It produces an estimate $\hat{\tau}$ and a C% confidence interval $[\tau^-, \tau^+]$.² In our decoder, we compare the scores to $\tau^+$ (i.e., a pessimistic estimate of $\tau$) to ensure a total false positive probability lower than $P_{fp}$. Last but not least, this approach works for any single decoder.

IV. Iterative, Joint decoding algorithm 
This section extends the single decoder based on the collusion inference $\hat{\boldsymbol{\theta}}_{c_{\max}}$ towards joint decoding, following the guidelines of Sec. II-D. Preliminary results about these key ideas were first presented in [17] and [18]. A schematic overview of the iterative, joint decoder is shown in Fig. 2.
A. Architecture 

The first principle is to iterate the score computation and include users accused in previous iterations as side information to build a more discriminative test. Let $\mathcal{U}_{SI} = \emptyset$ denote the initially empty set of accused users. In each iteration, we aim at identifying a (possibly empty) set of users $\mathcal{A} = \{j \in \mathcal{U} \setminus \mathcal{U}_{SI} \,|\, s_j > \tau\}$ and then update $\mathcal{U}_{SI}$ with $\mathcal{A}$.

² We are C% sure that the true $\tau$ lies in this interval.

Fig. 2. Overview of the iterative, side-informed joint Tardos fingerprint decoder.

Second, we additionally compute scores for subsets of $t$ users of $\mathcal{U} \setminus \mathcal{U}_{SI}$, $t \le c_{\max}$. Obviously, there are $\binom{|\mathcal{U} \setminus \mathcal{U}_{SI}|}{t}$ such subsets. As $n$ is large, enumerating and computing a score for each subset is intractable even for small $t$. The idea here is to find a restricted set $\mathcal{U}^{(t)} \subset \mathcal{U} \setminus \mathcal{U}_{SI}$ of $n^{(t)} = |\mathcal{U}^{(t)}|$ users that are the most likely to be guilty, and to keep $\binom{n^{(t)}}{t}$ approximately constant and within our computational resources. We gradually reduce $n^{(t)}$ by pruning out users who are unlikely to be colluders when going from single ($t = 1$) decoding to pair ($t = 2$) decoding, etc. If $n^{(t)} = O(n^{1/t})$, then the score computation of $t$-subsets over the restricted user set is within $O(n)$, just like for the single decoder.
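A quick count illustrates why pruning matters. The sketch below is plain Python; $n = 10^6$ is a hypothetical user base, and $n^{(t)}$ is set to $n^{1/t}$ rounded to the nearest integer. The number of all $t$-subsets explodes with $t$, whereas the number of $t$-subsets of the restricted list stays below $n$ (roughly $n / t!$).

```python
from math import comb

def subset_counts(n, t_max):
    """Return (t, n_t, C(n, t), C(n_t, t)) with the restricted list size n_t ~ n**(1/t)."""
    rows = []
    for t in range(1, t_max + 1):
        n_t = round(n ** (1.0 / t))
        rows.append((t, n_t, comb(n, t), comb(n_t, t)))
    return rows

for t, n_t, full, restricted in subset_counts(10**6, 5):
    print(f"t={t}: n^(t)={n_t}, all subsets={full:.3e}, restricted subsets={restricted}")
```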

Initially, the joint pair decoder starts with the list of users ranked by the scores derived from the single decoder in decreasing order, i.e., the top-ranked user is the most likely to be a colluder. Later on, the joint $t$-subset decoder produces a new list of scores computed from subsets of $t$ users which, according to theoretical results [9], are more discriminative as $t$ increases. Denote by $\mathcal{T}^* \subset \mathcal{U}^{(t)}$ the $t$-subset of users with the highest score. Our algorithm tries to accuse the most likely colluder within $\mathcal{T}^*$ and, if successful, updates $\mathcal{U}_{SI}$ and continues with the single decoder. If no accusation can be made, the algorithm generates a new list of suspects based on the ranking of joint scores, which is fed to the subsequent $(t+1)$ joint decoding stage.

In the detect-one scenario, iteration stops after the first accusation. We restrict the subset size to $t \le t_{\max}$ with $t_{\max} = 5$. This is not a severe limitation as, for moderately large $c$, the decoding performance advantage of the joint decoder quickly vanishes [9]. In the detect-many scenario, iteration stops when $|\mathcal{U}_{SI}| > c_{\max}$, or when $t$ reaches $\min(t_{\max}, c_{\max} - |\mathcal{U}_{SI}|)$ and no further accusation can be made. The set $\mathcal{U}_{SI}$ then contains the user indices to be accused. Alg. 1 illustrates the architecture of the accusation process for the detect-many scenario.
The next sections describe the score computation, the accusation of a user and the inference of the collusion process in more detail.

Algorithm 1 Iterative Joint Tardos Decoder.
Require: y, X, p, c_max, t_max < c_max, n^(t), P_fp
  U ← {j | 1 ≤ j ≤ n}, U_SI ← ∅
  repeat
    t ← 1
    θ̂_cmax ← infer(y, p, U_SI)
    W ← weights(y, p, θ̂_cmax, U_SI)
    s ← scores(U \ U_SI, X, W)
    τ ← threshold(p, W, n⁻¹ P_fp)
    A ← {j ∈ U \ U_SI | s_j > τ}
    while A = ∅ and t < t_max do
      t ← t + 1
      U^(t) ← {j ∈ U \ U_SI | s_j > top(s, n^(t))}
      W ← weights(y, p, θ̂_cmax, U_SI)
      s ← scores(binom(U^(t), t), X, W)            ▷ all t-subsets of U^(t)
      τ ← threshold(p, W, binom(n^(t), t)⁻¹ P_fp, t)
      T* ← argmax_T s_T
      if s_T* > τ then
        for all j ∈ T* and while A = ∅ do
          W ← weights(y, p, θ̂_cmax, U_SI ∪ {T* \ j})
          τ' ← threshold(p, W, n⁻¹ P_fp)
          A ← {j | score(j, X, W) > τ'}
        end for
      end if
    end while
    U_SI ← U_SI ∪ A
  until A = ∅ or |U_SI| > c_max
  return U_SI



B. Score computation 

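For a $t$-subset $\mathcal{T}$, the accusation is formulated as a hypothesis test based on the observations $(\mathbf{y}, \{\mathbf{x}_j\}_{j \in \mathcal{T}})$ to decide between $\mathcal{H}_0$ (all $j \in \mathcal{T}$ are innocent) and $\mathcal{H}_1$ (all $j \in \mathcal{T}$ are guilty). The score is simply the LLR tuned on the inference $\hat{\boldsymbol{\theta}}_{c_{\max}}$ of the collusion process.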



January 20, 2013 



DRAFT 



IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY 






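All these sequences are composed of independent random variables thanks to the code construction and the memoryless nature of the collusion. Moreover, the collusion only depends on the number of symbols '1' present in the codewords of a subset. Therefore, denote by $\boldsymbol{\delta}$ and $\boldsymbol{\varphi}$ the accumulated codewords corresponding to $\mathcal{U}_{SI}$ and $\mathcal{T}$: $\boldsymbol{\delta} = \sum_{j \in \mathcal{U}_{SI}} \mathbf{x}_j$ and $\boldsymbol{\varphi} = \sum_{j \in \mathcal{T}} \mathbf{x}_j$, so that $\forall i \in [m]$, $0 \le \delta(i) \le n_{SI}$ and $0 \le \varphi(i) \le t$, with $n_{SI} = |\mathcal{U}_{SI}|$. Thanks to the linear structure of the decoder, the score for a subset $\mathcal{T}$ of $t$ users is simply

$$s_{\mathcal{T}} = \sum_{i=1}^{m} W(\varphi(i), i), \tag{12}$$

where the $(t+1) \times m$ weight matrix $W$ is pre-computed from $(\mathbf{y}, \mathbf{p})$, taking into account the side information $\mathcal{U}_{SI}$, so that $\forall (\varphi, i) \in \{0, \dots, t\} \times \{1, \dots, m\}$:

$$W(\varphi, i) = \log \frac{\mathbb{P}(y(i) | \varphi + \delta(i), t + n_{SI}, p(i), \hat{\boldsymbol{\theta}}_{c_{\max}})}{\mathbb{P}(y(i) | \delta(i), n_{SI}, p(i), \hat{\boldsymbol{\theta}}_{c_{\max}})}. \tag{13}$$

For indices s.t. $y(i) = 1$, the numerator and the denominator share a generic formula, $\mathbb{P}(\varphi + \delta(i), t + n_{SI}, p(i), \hat{\boldsymbol{\theta}}_{c_{\max}})$ and $\mathbb{P}(\delta(i), n_{SI}, p(i), \hat{\boldsymbol{\theta}}_{c_{\max}})$ respectively, with

$$\mathbb{P}(u, v, p, \boldsymbol{\theta}) = \sum_{k=u}^{c_{\max} - v + u} \theta(k) \binom{c_{\max} - v}{k - u} p^{k-u} (1 - p)^{c_{\max} - v - k + u}. \tag{14}$$

In words, this expression gives the probability that $y = 1$ knowing that the symbol '1' has been distributed to users with probability $p$, the collusion model $\hat{\boldsymbol{\theta}}_{c_{\max}}$, and the identity of $v$ colluders who have $u$ symbols '1' and $v - u$ symbols '0'. For indices s.t. $y(i) = 0$ in (13), the numerator and the denominator need to be 'mirrored' ($\mathbb{P} \to 1 - \mathbb{P}$).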

At iterations based on the single decoder, $t = 1$ and $\boldsymbol{\varphi} = \mathbf{x}_j$ for user $j$. If nobody has been deemed guilty so far, then $\delta(i) = n_{SI} = 0$, $\forall i \in [m]$. This score is defined only if $t + n_{SI} \le c_{\max}$. Therefore, for a given size of side information, we cannot conceive a score for subsets of size bigger than $c_{\max} - n_{SI}$. This implies that, in the detect-many scenario, the maximal number of iterations depends on how fast $\mathcal{U}_{SI}$ grows.
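The weights (13)-(14) and the score (12) translate almost literally into code. The following sketch assumes NumPy, a collusion inference $\hat{\boldsymbol{\theta}}_{c_{\max}}$ given as an array of length $c_{\max} + 1$, and integer arrays for $\boldsymbol{\delta}$ and the subset codewords; the function names are hypothetical.

```python
import numpy as np
from math import comb

def prob_y1(u, v, p, theta):
    """Eq. (14): P(Y = 1) when v identified colluders hold u symbols '1' and the
    other c_max - v colluders each hold a '1' with probability p."""
    theta = np.asarray(theta, dtype=float)
    r = len(theta) - 1 - v                  # number of unidentified colluders
    u = np.asarray(u)
    return sum(theta[u + j] * comb(r, j) * p**j * (1 - p) ** (r - j) for j in range(r + 1))

def weight_matrix(y, p, theta, t, delta=None, n_si=0):
    """Eq. (13): the (t + 1) x m matrix W(phi, i), with side information given by
    the accumulated codeword delta of the n_si users already accused."""
    y = np.asarray(y)
    m = len(y)
    delta = np.zeros(m, dtype=int) if delta is None else np.asarray(delta, dtype=int)
    if t + n_si > len(theta) - 1:
        raise ValueError("score undefined when t + n_SI > c_max")
    den1 = prob_y1(delta, n_si, p, theta)
    W = np.empty((t + 1, m))
    with np.errstate(divide="ignore"):      # log(0) = -inf rules a subset out
        for phi in range(t + 1):
            num1 = prob_y1(delta + phi, t + n_si, p, theta)
            # Indices with y(i) = 0 are 'mirrored': P(Y = 0) = 1 - P(Y = 1).
            num = np.where(y == 1, num1, 1 - num1)
            den = np.where(y == 1, den1, 1 - den1)
            W[phi] = np.log(num) - np.log(den)
    return W

def subset_score(X_T, W):
    """Eq. (12): s_T = sum_i W(phi(i), i), phi being the accumulated codeword of T."""
    phi = np.asarray(X_T, dtype=int).sum(axis=0)
    return W[phi, np.arange(W.shape[1])].sum()
```

With the interleaving model $\theta(k) = k / c_{\max}$, (14) reduces to $(u + (c_{\max} - v) p) / c_{\max}$, which gives a quick sanity check. Computing $W$ once per iteration turns each subset score into a sum of $m$ table lookups, which keeps the joint stage affordable.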

C. Ranking users within a subset and joint accusation 

Let $\mathcal{T}^*$ denote the $t$-subset with the highest score. We accuse one user in $\mathcal{T}^*$ only if $s_{\mathcal{T}^*} > \tau$. Let $\mathcal{T}_{inn}$ denote a subset composed of innocent users. Using rare event analysis, $\tau$ is estimated s.t. $\mathbb{P}(s(\{\mathbf{x}_j\}_{j \in \mathcal{T}_{inn}}, \mathbf{y}, \mathbf{p}) > \tau) = \binom{n^{(t)}}{t}^{-1} P_{fp}$. This thresholding operation ensures that $\mathcal{T}^*$ contains at least one colluder with a very high probability.

In order to rank and accuse the most probable traitor in $\mathcal{T}^*$, we record for each user $j \in \mathcal{U}^{(t)}$ the subset leading to that user's highest score:

$$\mathcal{T}_j^* = \arg\max_{\mathcal{T} \ni j} s_{\mathcal{T}}. \tag{15}$$

We can count how often each user $j$ appears in the recorded subsets $\{\mathcal{T}_k^*\}_{k \in \mathcal{U}^{(t)}}$ and denote this value $a_j$. Finally, for a given $\mathcal{T}$, the users $j_k \in \mathcal{T}$ can be arranged s.t. $a_{j_1} \ge a_{j_2} \ge \dots \ge a_{j_t}$ to establish a ranking of users per subset.³
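The bookkeeping behind (15) and the counts $a_j$ can be sketched in plain Python, assuming the joint scores of the subsets of $\mathcal{U}^{(t)}$ are stored in a dictionary keyed by tuples of user indices (a hypothetical data layout); ties in $a_j$ keep the original order of $\mathcal{T}$.

```python
from collections import Counter

def rank_within_subset(subset_scores, T):
    """Rank the users of subset T by decreasing a_j, where a_j counts how often
    user j appears in the recorded best subsets T_k* of eq. (15).

    subset_scores: dict mapping a tuple of user indices to its joint score s_T.
    """
    best = {}                                   # user k -> (s_{T_k*}, T_k*)
    for subset, score in subset_scores.items():
        for k in subset:
            if k not in best or score > best[k][0]:
                best[k] = (score, subset)
    a = Counter(j for _, subset in best.values() for j in subset)
    return sorted(T, key=lambda j: a[j], reverse=True)

scores = {(1, 2): 5.0, (1, 3): 4.0, (2, 3): 1.0, (1, 4): 0.0, (2, 4): 3.0, (3, 4): -1.0}
print(rank_within_subset(scores, (3, 1)))      # [1, 3]
```

In this toy example, the recorded subsets are $(1,2)$ for users 1 and 2, $(1,3)$ for user 3 and $(2,4)$ for user 4, hence $a_1 = a_2 = 3$ and $a_3 = a_4 = 1$.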







³ This detail is omitted in Alg. 1 but is necessary for the procedure top().
Fig. 3. Attack channel and collusion model inference.



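To accuse a user $j \in \mathcal{T}^\star$, we check whether the single score $s(\mathbf{x}_j, \mathbf{y}, \mathbf{p}, \mathcal{U}_{si} \cup \{\mathcal{T}^\star \setminus j\}) > \tau'$, with $\tau'$ s.t. $P(s(\mathbf{x}_{inn}, \mathbf{y}, \mathbf{p}, \mathcal{U}_{si} \cup \{\mathcal{T}^\star \setminus j\}) > \tau') = n^{-1} P_{fp}$. This method is suggested in [?, Sec. 5.3].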
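Both thresholds $\tau$ and $\tau'$ are set from a per-candidate probability level. The Python sketch below only does that arithmetic, dividing $P_{fp}$ by $\binom{n}{t}$ for a $t$-subset and by $n$ for a single user; the values of $n$ and $P_{fp}$ in the example are illustrative assumptions, not the experimental settings.

```python
import math

def subset_level(n, t, p_fp):
    """Probability level for one t-subset of innocents: P_fp / C(n, t)."""
    return p_fp / math.comb(n, t)

def user_level(n, p_fp):
    """Probability level for one user in the accusation test: P_fp / n."""
    return p_fp / n

# Illustrative values: n = 10_000 users, P_fp = 1e-3
for t in range(1, 6):
    print(f"t = {t}: {subset_level(10_000, t, 1e-3):.3e}")
print(f"single user: {user_level(10_000, 1e-3):.3e}")
```

For this $n$, each extra user in the subset lowers the level by roughly $(n - t)/(t + 1)$, i.e. several orders of magnitude, which is why thresholding becomes more demanding as $t$ grows (Sec. VI-C).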

D. Inference of the collusion process 

The MLE is used to infer the collusion process:

$$\hat{\theta}_{c_{\max}} = \arg\max_{\theta} \log P(\mathbf{y} \mid \mathbf{p}, \mathcal{U}_{si}, \theta). \qquad (16)$$

Whenever a user is deemed guilty, they are added to the side information and we re-run the parameter estimation to refine the collusion inference.
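The optimizer behind (16) is not detailed here. One standard way to compute such an MLE is expectation-maximization (EM), treating the number $k$ of '1's among the colluders at index $i$ as a latent variable. The Python sketch below is our illustration under simplifying assumptions: no side information ($\mathcal{U}_{si} = \emptyset$), $c = c_{\max}$ colluders, and an initialization at the interleaving model. Starting from $\theta(0) = 0$ and $\theta(c_{\max}) = 1$, the EM updates keep these two entries fixed, which matches the marking assumption. Function names are hypothetical.

```python
from math import comb
import numpy as np

def binom_weights(p, c):
    """w[i, k] = C(c, k) p_i^k (1 - p_i)^(c - k): prior on the number k of '1's held by c colluders."""
    k = np.arange(c + 1)
    coef = np.array([comb(c, j) for j in k], dtype=float)
    return coef * p[:, None] ** k * (1 - p[:, None]) ** (c - k)

def log_likelihood(y, p, theta):
    """log P(y | p, theta) for hard symbols y, without side information."""
    p1 = binom_weights(p, len(theta) - 1) @ theta        # P(y_i = 1 | p_i, theta)
    return float(np.sum(np.where(y == 1, np.log(p1), np.log1p(-p1))))

def em_collusion_model(y, p, c_max, n_iter=100):
    """EM iterations for theta = (theta(0), ..., theta(c_max)) maximizing log P(y | p, theta)."""
    theta = np.linspace(0.0, 1.0, c_max + 1)   # interleaving start: theta(0) = 0, theta(c_max) = 1
    w = binom_weights(p, c_max)
    history = [log_likelihood(y, p, theta)]
    for _ in range(n_iter):
        # E-step: posterior of k at each index given y_i
        post = w * np.where(y[:, None] == 1, theta, 1.0 - theta)
        post /= post.sum(axis=1, keepdims=True)
        # M-step: theta(k) = posterior-weighted fraction of indices with y_i = 1
        theta = (post * y[:, None]).sum(axis=0) / np.maximum(post.sum(axis=0), 1e-300)
        history.append(log_likelihood(y, p, theta))
    return theta, history
```

EM guarantees that the log-likelihood never decreases from one iteration to the next, but it may stop at a local maximum; the returned `history` makes this easy to monitor.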

V. Soft Decoding under AWGN attack 

The marking assumption is an unrealistic restriction for traitor tracing with multimedia content, as the colluders are not limited to the copy-and-paste strategy for each symbol. They can merge the samples of their content versions (audio samples, pixels, DCT coefficients, etc.) in addition to traditional attempts to compromise the watermark. This may result in erroneously decoded symbols or erasures from the watermarking layer. Relaxing the marking assumption leads to several approaches, such as the combined digit model [19], [20, Sec. 4] and soft-decision decoding schemes [21], [22]. This section extends the capability of our joint decoder to this latter case, replacing the $2 \times (c+1)$ probability transition matrix $[P(y \mid \varphi)]$ (see Sec. II-B) by $c + 1$ probability density functions $\{\theta_c(y' \mid \varphi)\}_{\varphi=0}^{c}$.

It is challenging, if not impossible, to exhibit a model encompassing all the merging attacks while being relevant for a majority of watermarking techniques. Our approach, as sketched in Fig. 3, is pragmatic. The sequence $\mathbf{y}' \in \mathbb{R}^m$ is extracted from the pirated copy, with modulation $y'(i) = 2y(i) - 1$ if the signal is perfectly watermarked with binary symbol $y(i)$. To reflect the merging attack, the colluders forge values $z(i) \in [-1, 1]$ and add noise: $y'(i) = z(i) + n(i)$ with $n(i) \sim \mathcal{N}(0, \sigma_n^2)$. This would be the case, for instance, for a spread-spectrum watermarking scheme where a symbol is embedded per block of content with an antipodal modulation of a secret carrier [21], [23].
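As an illustration of this channel, the Python sketch below simulates $y'(i) = z(i) + n(i)$ for one particular merging attack, the averaging of the colluders' antipodal symbols, which gives $z(i) = 2\varphi(i)/c - 1 \in [-1, 1]$. The averaging choice and the function name are our assumptions for illustration; the model itself does not fix how $\mathbf{z}$ is forged.

```python
import numpy as np

def averaged_soft_sequence(colluder_codewords, sigma_n, rng):
    """Simulate y'(i) = z(i) + n(i) with z(i) the average of the colluders' antipodal symbols.

    colluder_codewords: (c, m) array of bits; bit b is modulated as 2b - 1.
    Noise n(i) ~ N(0, sigma_n^2), independent across indices.
    """
    z = (2 * np.asarray(colluder_codewords) - 1).mean(axis=0)
    return z + rng.normal(0.0, sigma_n, size=z.shape)
```

With $\sigma_n = 0$ and a single colluder, this returns exactly the antipodal sequence $2y(i) - 1$ of a perfectly watermarked copy.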

The colluders have two strategies to agree on $\mathbf{z}$. In a first strategy, they collude according to the marking assumption (i.e. they copy-and-paste one of their samples) and add noise: $\mathbf{z} \in \{-1, 1\}^m$, and the probability that $z = 1$ is given by the components of $\theta_c$:

$$\theta_c^{(II)}(y' \mid \varphi) = \Big(\theta_c(\varphi)\, e^{-\frac{(y'-1)^2}{2\sigma_n^2}} + \big(1 - \theta_c(\varphi)\big)\, e^{-\frac{(y'+1)^2}{2\sigma_n^2}}\Big) \Big/ \sqrt{2\pi\sigma_n^2}. \qquad (17)$$

Except for $\varphi \in \{0, c\}$, the pdfs have a priori two modes (hence the superscript $(II)$). This model is parameterized by $(\theta_c, \sigma_n^2)$.

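In a second strategy, the colluders select $z(i) = \mu(\varphi(i)) \in [-1, 1]$:

$$\theta_c^{(I)}(y' \mid \varphi) = e^{-\frac{(y'-\mu(\varphi))^2}{2\sigma_n^2}} \Big/ \sqrt{2\pi\sigma_n^2}. \qquad (18)$$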

An equivalent of the marking assumption would impose that $\mu(0) = -1$ and $\mu(c) = 1$. The pdfs have a unique mode (hence the superscript $(I)$). This model is parameterized by $(\mu, \sigma_n^2)$. Fig. 4 gives some examples of such pdfs.
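A direct transcription of (17) and (18) into Python/NumPy may help; `theta` and `mu` are arrays indexed by $\varphi = 0, \dots, c$, and the parameter values below are illustrative. The numerical check that each pdf integrates to one is our own sanity test, not part of the method.

```python
import numpy as np

def gauss(y_soft, mean, sigma_n):
    """N(mean, sigma_n^2) density."""
    return np.exp(-(y_soft - mean) ** 2 / (2 * sigma_n ** 2)) / (np.sqrt(2 * np.pi) * sigma_n)

def pdf_model_II(y_soft, phi, theta, sigma_n):
    """Eq. (17): z = +1 with probability theta[phi], else z = -1, plus Gaussian noise."""
    return theta[phi] * gauss(y_soft, 1.0, sigma_n) + (1 - theta[phi]) * gauss(y_soft, -1.0, sigma_n)

def pdf_model_I(y_soft, phi, mu, sigma_n):
    """Eq. (18): z = mu[phi] in [-1, 1], plus Gaussian noise."""
    return gauss(y_soft, mu[phi], sigma_n)

# Example with c = 3 colluders (illustrative parameter values)
theta = np.array([0.0, 0.3, 0.8, 1.0])   # theta(0) = 0, theta(c) = 1: marking assumption
mu = np.array([-1.0, -0.2, 0.4, 1.0])    # mu(0) = -1, mu(c) = 1
grid = np.linspace(-12.0, 12.0, 24001)
dx = grid[1] - grid[0]
for phi in range(4):
    # Riemann sums approximating the integral of each pdf over the real line
    print(phi, pdf_model_II(grid, phi, theta, 0.5).sum() * dx, pdf_model_I(grid, phi, mu, 0.5).sum() * dx)
```

Under the marking assumption, model $(II)$ at $\varphi = c$ collapses to the single Gaussian centred on $+1$, the same as model $(I)$ with $\mu(c) = 1$.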
A simple approach, termed hard decision decoding in the sequel, consists in first thresholding $\mathbf{y}'$ (to quantize $y'(i)$ into 0 if $y'(i) < 0$ and 1 otherwise), and then employing the collusion process inference of Sec. IV-D on the hard outputs. Our soft decision decoding method resorts to the noise-aware models (17) and (18) and sets

$$\hat{\theta}_{c_{\max}} = \arg\max_{\theta} P(\mathbf{y}' \mid \mathbf{p}, \mathcal{U}_{si}, \theta). \qquad (19)$$

Notice that models $(I)$ and $(II)$ share the same number of parameters; therefore, comparing their likelihoods does not favor one model merely because it has more degrees of freedom, and there is no risk of over-fitting.
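The two decoding front ends can be contrasted in a few lines of Python. `hard_decisions` implements the thresholding at 0 described above; `soft_log_likelihood` evaluates the objective $\log P(\mathbf{y}' \mid \mathbf{p}, \theta)$ of (19) by marginalizing $\varphi(i) \sim \mathcal{B}(c, p(i))$, assuming no side information. The callable `cond_pdf`, standing for (17) or (18) with fixed parameters, is a hypothetical interface of ours, and the maximization over the parameters is omitted.

```python
from math import comb
import numpy as np

def hard_decisions(y_soft):
    """Hard decision decoding: y(i) = 0 if y'(i) < 0, else 1."""
    return (np.asarray(y_soft) >= 0).astype(int)

def soft_log_likelihood(y_soft, p, cond_pdf, c):
    """log P(y' | p, model), with phi(i) ~ Binomial(c, p(i)) marginalized out.

    cond_pdf(y_soft, phi) returns the density of each y'(i) given phi(i) = phi.
    """
    y_soft, p = np.asarray(y_soft, dtype=float), np.asarray(p, dtype=float)
    density = np.zeros_like(y_soft)
    for phi in range(c + 1):
        prior = comb(c, phi) * p ** phi * (1 - p) ** (c - phi)
        density += prior * cond_pdf(y_soft, phi)
    return float(np.log(density).sum())
```

For instance, `cond_pdf` could wrap the (17) or (18) density for a candidate $\theta$ or $\mu$; maximizing this quantity for each model and keeping the model with the larger maximum is one way to realize the adaptive choice described in Sec. VI-B.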



VI. Experimental Results 

We implemented the Tardos decoders in C++.⁴ Single and joint score computation is implemented efficiently using pre-computed lookup tables, cf. (12) and (13), and the aggregation techniques described in [17]. For a code length of $m = 1024$, more than $10^6$ single and about $10^{?}$ joint scores, respectively, can be computed per second on a single core of a regular Intel Core2 2.6 GHz CPU. To control the runtime, the joint decoders are confined to 5-subset decoding ($t_{\max} = 5$) and $p^{(t)} \approx 4.5 \cdot 10^{?}$ computed subsets per joint decoding stage. An iterative decoding experiment can be executed on a PC within a couple of minutes, given enough memory; see [18] for details. To experimentally verify the false-positive rate controlled by rare-event analysis, up to $3 \cdot 10^{?}$ tests per parameter setting have been performed on a cluster of PCs.
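The lookup-table idea can be sketched in Python/NumPy: since each per-index contribution depends only on $(y(i), x_j(i), p(i))$, four values per index are precomputed once, and every user's score becomes a gather-and-sum. The per-symbol function below is the symmetric Tardos score, chosen only as an example; the paper's own tables are built from (12) and (13), and its implementation is in C++.

```python
import numpy as np

def symmetric_tardos(y, x, p):
    """Per-symbol symmetric Tardos score (example choice of per-index function)."""
    agree = np.sqrt((1 - p) / p) if y == 1 else np.sqrt(p / (1 - p))
    disagree = -np.sqrt(p / (1 - p)) if y == 1 else -np.sqrt((1 - p) / p)
    return agree if x == y else disagree

def build_score_table(p, per_symbol):
    """table[i, y, x] = per_symbol(y, x, p[i]), computed once for all users."""
    table = np.empty((p.size, 2, 2))
    for y in (0, 1):
        for x in (0, 1):
            table[:, y, x] = per_symbol(y, x, p)
    return table

def all_user_scores(X, y, table):
    """Scores of all users (rows of the n x m binary matrix X) by lookup and summation."""
    return table[np.arange(y.size), y, X].sum(axis=1)
```

The table costs $O(m)$ to build, after which scoring all $n$ users costs one lookup and one addition per symbol, i.e. $O(n \cdot m)$.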

First, we compare the performance of the proposed decoders under the marking assumption. Then, we lift this unrealistic restriction and turn to a more practical assessment using soft-decision decoding.

Unless explicitly noted, the terms single and joint decoder refer to the decoders conditioned on the inference of the collusion process $\hat{\theta}_{c_{\max}}$, cf. (8) and (12). Further, we consider the MAP decoders assuming knowledge of $\theta_c$, and the compound channel decoder, cf. (7), tuned on the worst-case attack $\theta^\star_k$, $\forall k \in \{2, \dots, c_{\max}\}$. As a baseline for performance comparison, we always include the symmetric Tardos score computation [8] with a threshold controlled by rare-event analysis (see Sec. III-C).
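For readers unfamiliar with rare-event analysis, here is a compact Python sketch of one common estimator of such a threshold, adaptive multilevel splitting, applied to the symmetric Tardos score of an innocent user. It is our illustration and not necessarily the exact procedure of Sec. III-C: the particle count, the fraction `keep` of particles retained per level, and the mutation kernel (each symbol redrawn from its prior with probability `mutate`, the move being accepted only if the score stays above the current level) are all assumed values.

```python
import math
import numpy as np

def symmetric_tardos_table(p):
    """table[i, y, x]: per-symbol symmetric Tardos scores."""
    a, b = np.sqrt((1 - p) / p), np.sqrt(p / (1 - p))
    table = np.empty((p.size, 2, 2))
    table[:, 1, 1], table[:, 1, 0] = a, -b
    table[:, 0, 0], table[:, 0, 1] = b, -a
    return table

def estimate_threshold(y, p, alpha, rng, n_particles=1000, keep=0.5, mh_steps=10, mutate=0.1):
    """Adaptive multilevel splitting: tau such that P(s(x_inn) > tau) is about alpha."""
    table = symmetric_tardos_table(p)
    idx = np.arange(p.size)

    def score(X):
        return table[idx, y, X].sum(axis=1)

    X = (rng.random((n_particles, p.size)) < p).astype(np.int64)  # innocent codewords
    s = score(X)
    n_kill = n_particles - int(keep * n_particles)
    n_levels = int(math.floor(math.log(alpha) / math.log(keep)))
    for _ in range(n_levels):
        order = np.argsort(s)
        level = s[order[n_kill - 1]]          # a fraction `keep` of particles lies above
        killed, survivors = order[:n_kill], order[n_kill:]
        clones = rng.choice(survivors, size=n_kill)
        X[killed], s[killed] = X[clones], s[clones]
        for _ in range(mh_steps):             # prior-reversible moves, kept only above `level`
            redraw = rng.random(X.shape) < mutate
            fresh = (rng.random(X.shape) < p).astype(np.int64)
            X_new = np.where(redraw, fresh, X)
            s_new = score(X_new)
            ok = s_new > level
            X[ok], s[ok] = X_new[ok], s_new[ok]
    # Last step: empirical quantile for the remaining factor alpha / keep**n_levels
    return float(np.quantile(s, 1.0 - alpha / keep ** n_levels))
```

Crude Monte Carlo can confirm the returned $\tau$ only for moderate $\alpha$; the point of splitting is that the cost grows with $\log(1/\alpha)$, so levels such as $P_{fp}/n$ remain reachable.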



A. Decoding performance under marking assumption 

1) Detect-one scenario: Here the aim is to catch at most one colluder; this is the tracing scenario most commonly considered in the literature. We compare our single and joint decoder performance against the results provided by Nuida et al. [24] (which are the best as far as we know) and, as a second reference, the symmetric Tardos decoder.

The experimental setup considers $n = 10^{?}$ users and $c \in \{2, 3, 4, 6, 8\}$ colluders performing the worst-case attack [14] against a single decoder. In Fig. 5 we plot the empirical probability of error $P_e = P_{fp} + P_{fn}$, obtained by running $10^{?}$ experiments for each setting, versus the code length $m$. The false-positive error is controlled by thresholding based on rare-event simulation, $P_{fp} = 10^{-?}$, which is confirmed experimentally. Evidently, for a given probability of error, the joint decoder reduces the required code length compared to the single decoder, especially for larger collusions.

Table I compares the code length required to obtain an error rate of $P_e \approx 10^{-3}$ for our proposed Tardos decoders and the symmetric Tardos decoder with the results reported by Nuida et al. [24] under the marking assumption. Except for



⁴ Source code is available at http://www.irisa.fr/texmex/people/furon/src.html

Fig. 5. Code length vs. $P_e$ for $n = 10^{?}$ users and different numbers of colluders performing the worst-case attack against a single decoder; $c_{\max} = 8$.

TABLE I
Code length comparison for the detect-one scenario: $n = 10^{?}$, worst-case attack against a single decoder, $P_e = 10^{-3}$.

Colluders (c) | Nuida et al. | Symm. Tardos | Proposed single ($c_{\max} = 8$) | Proposed joint ($c_{\max} = 8$)
2 | 253 | ≈ 416 | ≈ 368 | ≈ 304
3 | 877 | ≈ 864 | ≈ 776 | ≈ 584
4 | 1454 | ≈ 1472 | ≈ 1152 | ≈ 904
6 | 3640 | ≈ 2944 | ≈ 2304 | ≈ 1616
8 | 6815 | ≈ 5248 | ≈ 3712 | ≈ 2688



$c = 2$, the proposed decoders substantially reduce the required code length, and the joint decoder improves on the results of the single decoder. Note that Nuida's results give analytic code lengths assuming a particular number of colluders for constructing the code, while our results are experimental estimates based on the worst-case attack against a single decoder and without knowing $c$ (subject to $c \leq c_{\max} = 8$). Results with $c$ known are provided in [18] and show a slightly better performance: the required code length of the joint decoder is then slightly shorter than that of Nuida's code in case $c = 2$.

2) Detect-many scenario: We now consider the more realistic case where the code length $m$ is fixed and the false-negative error rate is only a minor concern,⁵ while the false-positive probability is critical to avoid accusing an innocent. The aim is to identify as many colluders as possible.

⁵ Tracing schemes rightly accusing a colluder half of the time might be enough to dissuade dishonest users.

Figures 6(a) to 6(d) show the average number of identified colluders for different decoding approaches. The experimental setup considers $n = 10^{?}$ users, code length $m = 2048$, and several collusion attacks (worst-case attacks, i.e. minimizing the achievable rate of a single or joint decoder; interleaving; and majority, which is a rather mild attack) carried out by two to eight colluders. The global probability of a false positive error is fixed to $P_{fp} = 10^{-?}$.

As expected, the MAP single decoder knowing $\theta_c$ provides the best decoding performance amongst the single decoders, yet it is unobtainable in practice. The symmetric Tardos decoder performs poorly but evenly against all attacks; the single decoder based on the compound channel (7) improves the results only slightly.

The joint decoders consistently manage to identify most colluders, with a dramatic margin in case the traitors choose the worst-case attack against a single decoder. This attack hinders the very first step of our decoder, but as soon as some side information is available or a joint decoder is used, it is no longer the worst-case attack. Finding the worst-case attack against our iterative decoder is indeed difficult. A good guess is the interleaving attack, which is asymptotically the worst case against the joint decoder [1]. The experiments show that it substantially reduces the performance of the joint decoders for large $c$.

The decoder based on the inference $\hat{\theta}_{c_{\max}}$ and the true MAP decoder differ when $c$ is lower than $c_{\max}$. However, this is not a big deal in practice for a fixed $m$: for small $c$, the code is long enough to face the collusion even if the score is less discriminative than the ideal MAP; for large $c$, the score of our decoder gets closer to the ideal MAP.

B. Decoding performance of the soft decoder 

We assess the performance of the soft decision decoders proposed in Sec. V in two tracing scenarios: (i) the setup considered by Kuribayashi in [21], with $n = 10^{?}$ users and code length $m = 10^{?}$; (ii) a large-scale setup with 33 554 432 users and $m = 7\,440$, where Jourdas and Moulin [23] provide results for their high-rate random-like fingerprinting code under averaging and interleaving attacks.

In Fig. 7 we compare the average number of identified colluders for the single and joint decoders using different estimates of the collusion process: hard refers to decoders using hard thresholding and $\hat{\theta}_{c_{\max}}$, while soft identifies the noise-aware decoders relying on $\hat{\theta}^{(I)}_{c_{\max}}$ or $\hat{\theta}^{(II)}_{c_{\max}}$, chosen adaptively based on the likelihood of the two models. All plots also show the results for the (hard-thresholding) symmetric Tardos decoder. The false-positive rate is set to $10^{-?}$. Extensive experiments ($3 \cdot 10^{?}$ test runs) have been carried out to validate the accusation threshold obtained by rare-event simulation. As expected, soft decoding offers substantial gains in decoding performance. The margin between the single and joint decoders depends on the collusion strategy. Dramatic improvements can be seen when the collusion chooses the worst-case attack against a single decoder, cf. Fig. 7(a). On the other hand, the gain is negligible when averaging is performed.

Note that the attacks in Fig. 7(a) to 7(c) pertain to the pick-and-paste attacks, while Fig. 7(d) shows the linear averaging attack.



Comparison with the results provided in [21] for the majority attack is difficult: (i) they were obtained for Nuida's discrete code construction [24] tuned on $c = 7$ colluders, and (ii) the false-positive rate of [21] does not seem to be under control for the symmetric Tardos code. We suggest using the hard symmetric Tardos decoder [8] as a baseline for performance comparison. By replacing the accusation thresholds proposed in [21] with a rare-event simulation, we are able to fix the false-alarm rate in the case of the symmetric Tardos code. Furthermore, the decoding results given in [21] for the discrete variant of the fingerprinting code (i.e. Nuida's construction) could be significantly improved by thresholding based on rare-event simulation. Contrary to the claim of [21], soft decision decoding always provides a performance benefit over the hard decoders.

In Fig. 8 we illustrate the decoding performance when dealing with a large user base. We consider averaging and interleaving attacks by $c = 2, \dots, 12$ and $c = 2, \dots, 8$ colluders ($c_{\max} = 12$ and $c_{\max} = 8$, respectively), followed by AWGN with variance $\sigma_n^2 = 1$. The global false-positive rate is set to $10^{-?}$. The benefit of the soft decoding approach is clearly evident. Joint decoding provides only a very limited increase in the number of identified colluders. For comparison, Jourdas & Moulin indicate an error rate of $P_e = 0.0074$ for $c = 10$ colluders in the first setting, and $P_e = 0.004$ for $c = 5$ colluders in the second setting, for a detect-one scenario [23].



In [25], $P_{fp} = 0.0016$ and $P_{fn} = 0.044$ are given for the first experiment (Fig. 8(a)) by introducing a threshold to control the false-positive rate. Our soft joint decoder achieves $P_{fn} = 0.046$ for $P_{fp} = 10^{-?}$ (for $c = 10$ colluders), catching 2.6 traitors on average.



In the second experiment (see Fig. 8(b)), our joint decoder compares more favorably: with the given code length, all $c = 5$ colluders can be identified, and for a collusion of size $c = 8$, 4.5 traitors are accused on average without observing any decoding failure in $3 \cdot 10^{?}$ tests.



C. Runtime Analysis 

Single decoding can be implemented efficiently, computing more than one million scores per second for a code of length $m = 1024$. Its complexity is in $O(n \cdot m)$. Selecting the $p^{(t)}$ most likely guilty users can be done efficiently with the max-heap algorithm. Yet, it consumes a substantial part of the runtime for small $m$. The runtime contribution of the joint decoding stage clearly depends on the size of the pruned list of suspects, $O(m \cdot p^{(t)})$, and is independent of the subset size $t$ thanks to the revolving door enumeration of the subsets.⁶ Restricting $p^{(t)}$ and $t_{\max}$ keeps the joint decoding approach computationally tractable. Better decoding performance can be obtained using higher values at the cost of a substantial increase in runtime. Experiments have shown that even moderate settings ($p^{(t)} \approx 4.5 \cdot 10^{?}$ and $t_{\max} = 5$) achieve a considerable gain of the joint over the single decoder for several collusion channels.
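The revolving-door order can be generated with the classical recursive construction (all $t$-subsets of the first $n-1$ users, followed by the reversed list of $(t-1)$-subsets extended with user $n-1$), in which consecutive subsets differ by exactly one user. The Python sketch below, with hypothetical names, uses it to update $\varphi$ in $O(m)$ per subset by adding the incoming codeword and subtracting the outgoing one. It materializes the whole list, so it only suits small illustrations; a loopless generator such as the revolving-door algorithm in Knuth's TAOCP, Sec. 7.2.1.3, avoids that. Whether the authors' C++ code uses this exact construction is not stated here.

```python
from functools import lru_cache
import numpy as np

@lru_cache(maxsize=None)
def revolving_door(n, t):
    """All t-subsets of range(n); consecutive subsets differ by swapping one element."""
    if not 0 <= t <= n:
        return ()
    if t == 0:
        return ((),)
    if t == n:
        return (tuple(range(n)),)
    first = revolving_door(n - 1, t)
    second = tuple(s + (n - 1,) for s in reversed(revolving_door(n - 1, t - 1)))
    return first + second

def iterate_subset_counts(X, t):
    """Yield (subset, phi) where phi[i] counts the '1's of the subset's codewords at index i.

    X is an n x m integer array of codewords and 1 <= t <= n; each step costs O(m), whatever t is.
    """
    subsets = revolving_door(X.shape[0], t)
    prev = subsets[0]
    phi = X[list(prev)].sum(axis=0)
    yield prev, phi.copy()
    for cur in subsets[1:]:
        (leaving,) = set(prev) - set(cur)
        (joining,) = set(cur) - set(prev)
        phi += X[joining] - X[leaving]   # swap one user's codeword
        yield cur, phi.copy()
        prev = cur
```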

Thresholding accounts for more than half of the runtime in the experimental setups investigated in this work. However, this is not a serious issue for applications with a large user base or when $p^{(t)}$ becomes large. Thresholding depends on the subset size $t$ because a large number of random codeword combinations must be generated and
because we seek a lower probability level, in $O(P_{fp}/n^t)$. Therefore, the complexity is in $O(m \cdot t^2 \cdot \log(n))$ according to [26]. There are no more than $c_{\max}$ such iterations with $t \leq c_{\max}$, so that the global complexity of our decoder stays in $O(m \log(n))$.

⁶ In each step, $\varphi$ is updated by replacing one user's codeword. See [18] for details.



January 20, 2013



DRAFT 



IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY 






More details about the runtime are given in [18]. Note that the results have been obtained with a single CPU core, although a parallel implementation can easily be achieved.

VII. Conclusion 

Decoding probabilistic fingerprinting codes in practice means tracing guilty persons over a large set of users while having no information about either the size or the strategy of the collusion. This must be done reliably by guaranteeing a controlled probability of false alarm.

Our decoder implements provably good concepts of information theory (joint decoding, side information, linear decoder for compound channels) and statistics (estimation of extreme quantiles of a rare event). Its extension to soft output decoding is straightforward as it does not change the decoder's architecture.

Since the proposed iterative method is neither just a single decoder nor completely a joint decoder (it only 
considers subsets over a short list of suspects), it is rather difficult to find the best distribution for code construction 
and its worst case attack. Experiments show that the interleaving attack is indeed more dangerous than the worst-case 
attack against a single decoder. 

Appendix 

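We prove that $\mathcal{E}_{c_{\max}}(\theta_c) = \{\theta_k : k \leq c_{\max},\ P(y \mid p, \theta_k) = P(y \mid p, \theta_c),\ \forall (y, p) \in \{0,1\} \times [0,1]\}$ is one-sided. The collusion channels of this set share the property that $P(Y = 1 \mid p, \theta_k) = q(p) > 0$, $\forall p \in [0, 1]$. From [14, Eq. (20)]: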

$$P(Y = 1 \mid X = 1, p, \theta_k) = q(p) + k^{-1}(1-p)\,q'(p) \qquad (20)$$
$$P(Y = 1 \mid X = 0, p, \theta_k) = q(p) - k^{-1}p\,q'(p) \qquad (21)$$
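As a quick consistency check (our addition, using only (20) and (21)), both conditional probabilities average back to the common marginal $q(p)$ of the set:

$$
\begin{aligned}
P(Y=1 \mid p, \theta_k) &= p\,P(Y=1 \mid X=1, p, \theta_k) + (1-p)\,P(Y=1 \mid X=0, p, \theta_k) \\
&= p\big(q(p) + k^{-1}(1-p)\,q'(p)\big) + (1-p)\big(q(p) - k^{-1}p\,q'(p)\big) \\
&= q(p).
\end{aligned}
$$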

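Take $(\theta_{k_A}, \theta_{k_B}) \in \mathcal{E}_{c_{\max}}(\theta_c)^2$ s.t. $k_A < k_B$. We first show that $R(f_T, \theta_{k_A}) > R(f_T, \theta_{k_B})$, so that the minimizer of $R(f_T, \theta)$ over $\mathcal{E}_{c_{\max}}(\theta_c)$ is indeed $\theta_{c_{\max}}$. Denote by $(\mu_1, \mu_2)$ the following conditional probability distributions: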

fii{y,x\p) = ¥{¥ = y\p) = qip)y{l-q{p))^'-y^ (22) 
^l2iy,x\p) = V{Y = y\X = x,p,ekA). (23) 

Then, $P(Y \mid X, p, \theta_{k_B}) = (1 - \lambda)\,\mu_1(Y, X \mid p) + \lambda\,\mu_2(Y, X \mid p)$, $\forall p \in [0,1]$, with $\lambda = k_A / k_B < 1$. The mutual information is a convex function of $P(Y \mid X, p)$ for fixed $P(X \mid p)$, so that, once integrated over $f_T(p)$, we have

$$R(f_T, \theta_{k_B}) \leq (1 - \lambda) \cdot 0 + \lambda \cdot R(f_T, \theta_{k_A}) < R(f_T, \theta_{k_A}). \qquad (24)$$
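The decomposition used above follows directly from (20) and (21); for $X = 1$ (our expansion):

$$
\begin{aligned}
P(Y=1 \mid X=1, p, \theta_{k_B}) &= q(p) + \frac{k_A}{k_B}\,\frac{(1-p)\,q'(p)}{k_A} \\
&= (1-\lambda)\,q(p) + \lambda\big(q(p) + k_A^{-1}(1-p)\,q'(p)\big) \\
&= (1-\lambda)\,\mu_1(1, 1 \mid p) + \lambda\,\mu_2(1, 1 \mid p).
\end{aligned}
$$

The $X = 0$ case is identical with $-p\,q'(p)$ in place of $(1-p)\,q'(p)$, and $\mu_1$ contributes zero mutual information because it does not depend on $x$, which is the $(1 - \lambda) \cdot 0$ term in (24).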
We now prove that (6) holds $\forall \theta \in \mathcal{E}_{c_{\max}}(\theta_c)$. This is equivalent to

$$R(f_T, \theta_k) - D\big(P_{Y,X \mid \theta_k} \,\|\, P_{Y,X \mid \theta_{c_{\max}}}\big) - R(f_T, \theta_{c_{\max}}) \geq 0, \qquad (25)$$






where the LHS is of the form $\mathbb{E}_{P \sim f_T}[g(P)]$. After developing the expressions, we find that:

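$$
\begin{aligned}
g(p) = \left(k^{-1} - c_{\max}^{-1}\right) p(1-p) \Big(\, & q'(p)\log\Big(1 + \frac{(1-p)\,q'(p)}{c_{\max}\,q(p)}\Big) - q'(p)\log\Big(1 - \frac{(1-p)\,q'(p)}{c_{\max}\,(1-q(p))}\Big) \\
& - q'(p)\log\Big(1 - \frac{p\,q'(p)}{c_{\max}\,q(p)}\Big) + q'(p)\log\Big(1 + \frac{p\,q'(p)}{c_{\max}\,(1-q(p))}\Big) \Big). \qquad (26)
\end{aligned}
$$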

The four terms inside the parentheses are non-negative because, with $\gamma > 0$, $x \log(1 + \gamma x) \geq 0$ for $x > -\gamma^{-1}$ (each term is of this form with $x = \pm q'(p)$). Since $k \leq c_{\max}$, we obtain $g(p) \geq 0$, whence (6).
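A numerical sanity check of (26), which is ours and not part of the proof: for the interleaving family ($q(p) = p$, hence $q'(p) = 1$), the Python sketch below evaluates the integrand of (25) directly from the binary channels (20) and (21) and compares it with $g(p)$. Function names are hypothetical.

```python
import math

def kl_bern(a, b):
    """D(Bern(a) || Bern(b)) in nats, assuming 0 < b < 1."""
    out = 0.0
    if a > 0:
        out += a * math.log(a / b)
    if a < 1:
        out += (1 - a) * math.log((1 - a) / (1 - b))
    return out

def channel(p, q, dq, k):
    """Eqs. (20)-(21): P(Y=1 | X=1, p, theta_k) and P(Y=1 | X=0, p, theta_k)."""
    return q + (1 - p) * dq / k, q - p * dq / k

def mutual_info(p, q, dq, k):
    """I(X; Y) at a given p, using P(Y = 1 | p) = q."""
    a, b = channel(p, q, dq, k)
    return p * kl_bern(a, q) + (1 - p) * kl_bern(b, q)

def integrand_25(p, q, dq, k, c_max):
    """Per-p value of R(theta_k) - D(P_{Y,X|theta_k} || P_{Y,X|theta_cmax}) - R(theta_cmax)."""
    a_k, b_k = channel(p, q, dq, k)
    a_c, b_c = channel(p, q, dq, c_max)
    divergence = p * kl_bern(a_k, a_c) + (1 - p) * kl_bern(b_k, b_c)
    return mutual_info(p, q, dq, k) - divergence - mutual_info(p, q, dq, c_max)

def g(p, q, dq, k, c_max):
    """Eq. (26)."""
    bracket = (dq * math.log(1 + (1 - p) * dq / (c_max * q))
               - dq * math.log(1 - (1 - p) * dq / (c_max * (1 - q)))
               - dq * math.log(1 - p * dq / (c_max * q))
               + dq * math.log(1 + p * dq / (c_max * (1 - q))))
    return (1 / k - 1 / c_max) * p * (1 - p) * bracket

for k in (2, 3, 4, 5):
    print(k, integrand_25(0.3, 0.3, 1.0, k, 5), g(0.3, 0.3, 1.0, k, 5))
```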